Fluctuations of the Empirical Entropies of a Chain of Infinite Order



Davide Gabrielli
Università dell'Aquila

Antonio Galves
Universidade de São Paulo

Daniela Guiol
Universidade de São Paulo



October 28, 2003 



Abstract 



This paper addresses the question of the fluctuations of the empirical entropy of a chain
of infinite order. We assume that the chain takes values on a finite alphabet and loses memory
exponentially fast. We consider two possible definitions for the empirical entropy, both
based on the empirical distribution of cylinders with length c log n, where n is the size of the
sample and c is a suitable constant. The first one is the conditional entropy of the empirical
distribution, given a past with length growing logarithmically with the size of the sample. The
second one is the rescaled entropy of the empirical distribution of the cylinders of size growing
logarithmically with the size of the sample. We prove a central limit theorem for the first one.
We also prove that the second one does not have Gaussian fluctuations. This solves a problem
formulated in Iosifescu (1965).

1 Introduction



This paper considers the following question. Suppose we have a long string of symbols produced 
by a chain of infinite order. How does the empirical entropy estimated from the sample fluctuate 
around the true entropy of the chain? 

By a chain of infinite order, we mean a stationary stochastic process in which at each step the 
probability governing the choice of a new symbol depends on the entire past. We will assume that 
this dependency on the past decreases exponentially fast. We will also assume that the symbols 
belong to a finite alphabet. 

We consider two definitions for the empirical entropy. The first one is the conditional entropy 
of the empirical distribution, given a past with length growing logarithmically with the size of the 
sample. The second one is the renormalized entropy of the empirical distribution of cylinders. In 
this case also the size of the cylinders entering in the definition of the empirical entropy grows 
logarithmically with the size of the sample. 

In his classical 1965 article, Iosifescu considered the second definition, in the simpler case in
which the length of the cylinders remains constant and does not grow with the size of the sample.
In this case, he proved that a central limit theorem holds. He observed at the end of the paper that
the problem remained open when the size of the cylinders is an increasing function of the size of the
sample. That is the situation we consider here. We prove that in this case, the empirical entropy
defined in the second way does not have Gaussian fluctuations around the theoretical entropy of
the chain.

In contrast to this negative result, we prove that the central limit theorem holds for the
empirical conditional entropy. These two theorems are the main results of this paper.

Chains of infinite order seem to have been first studied by Onicescu and Mihoc (1935) who
called them chains with complete connections (chaînes à liaisons complètes). Their study was
soon taken up by Doeblin and Fortet (1937) who proved the first results on speed of convergence
toward the invariant measure. The name chains of infinite order was coined by Harris (1955). We
refer the reader to Iosifescu and Grigorescu (1990) for a complete survey, and to Fernández, Ferrari
and Galves (2001) for an elementary presentation of the subject from a constructive point of view.

Fluctuations of empirical entropies were already studied in Basarin (1959) for sequences of
independent random variables and Markov chains. For chains of infinite order, Iosifescu (1965)
proved a central limit theorem for the density of entropy of the empirical k-marginals when k is
fixed and does not change with n, the size of the sample. The convergence of the empirical entropy
when k increases with n was proved in Ornstein and Weiss (1990) (see also Shields 1996). For a
presentation of the problem of the estimation of the entropy from a physical point of view we refer
the reader to Schürmann and Grassberger (1996).

To prove our theorems we use the following strategy. We decompose the difference between
the empirical and the theoretical entropies as the sum of the relative entropy between the empirical
and theoretical marginals plus a Birkhoff sum of some cylindric functions and a remainder. It turns
out that the rate of convergence is different for each one of the empirical entropies we consider.
The first one converges as the conditional entropy of the marginals of the source, while the second
one converges as the density of entropy of the marginals, which is much slower. The basis of both
proofs is a Central Limit Theorem for cylinder functions with supports of increasing lengths. This
theorem is of independent interest. Its proof uses a regenerative construction of the chain of infinite
order.

The paper is organized as follows. Notation, definitions and the statement of the two main
theorems are given in section 2. The proof of the Central Limit Theorem is given in section 3.
In section 4 we prove the asymptotic normality of the fluctuations of the empirical conditional 
entropy. Finally, in section 5 we prove that the density of entropy of the empirical marginals 
cannot have Gaussian fluctuations except in the case of a sequence of independent and identically 
distributed random variables. 

2 Definitions and main results

Let $A$ be a finite alphabet. Given $x := (x_i)_{i\in\mathbb{Z}} \in A^{\mathbb{Z}}$ and two integers $m < n$, the finite sequence
$(x_m, \ldots, x_n)$ of elements of $A$ will be denoted $x_m^n$. We will also use the symbol $x_m^n$ with $m$ or $n$ or
both infinite. A sequence $x_{-\infty}^0 := (x_i)_{i\le 0}$ will be called a history. Given two histories $x_{-\infty}^0$ and
$y_{-\infty}^0$, we say that $x_{-\infty}^0 \overset{m}{=} y_{-\infty}^0$ if $x_i = y_i$ for all $i = 0, -1, \ldots, -m$.

Let $X := (X_i)_{i\in\mathbb{Z}}$ be a stationary process taking values on the finite alphabet $A$ and defined
on a probability space $(\Omega, \mathcal{F}, P)$. We denote by $\nu$ the law of the chain. This is the unique measure
defined on $A^{\mathbb{Z}}$ such that

$$\nu([a_1^k]) = P(X_1^k = a_1^k),$$

for any $k$ and any sequence $a_1^k$, where $[a_1^k]$ denotes the cylinder set

$$[a_1^k] := \{x \in A^{\mathbb{Z}} : x_1^k = a_1^k\}.$$

Let $\nu_k$ be the marginal of size $k$ of $\nu$. It is a probability measure on $A^k$ defined by

$$\nu_k(a_1^k) = \nu([a_1^k]).$$

We will also use the shorthand

$$\nu_{k+1}(a_0 \mid a_{-k}^{-1}) = P(X_0 = a_0 \mid X_{-k}^{-1} = a_{-k}^{-1}).$$
The extensive entropy of order $k$ is defined as

$$H(\nu_k) := -\sum_{a_1^k \in A^k} \nu_k(a_1^k) \log \nu_k(a_1^k).$$

The conditional entropy of order $k$ is defined as

$$h(\nu_{k+1}) := -\sum_{a_{-k}^0 \in A^{k+1}} \nu_{k+1}(a_{-k}^0) \log \nu_{k+1}(a_0 \mid a_{-k}^{-1}),$$

for $k \ge 1$ and as

$$h(\nu_1) := -\sum_{a_0 \in A} \nu_1(a_0) \log \nu_1(a_0),$$

for $k = 0$. The entropy of the chain is defined as

$$h(\nu) := \lim_{k\to+\infty} \frac{1}{k} H(\nu_k) = \lim_{k\to+\infty} h(\nu_k).$$

The fact that the two limits in the definition coincide is a well known result (see for instance Shields
1996). Whenever the reference measure $\nu$ is clearly indicated by the context we will use the short
notation

$$H_k = H(\nu_k), \quad h_k = h(\nu_{k+1}), \quad h = h(\nu). \qquad (2.1)$$
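The difference in speed between the two limits defining the entropy can be seen on a toy example. The following is a minimal sketch, not taken from the paper: it assumes a hypothetical two-state Markov chain of order 1 (the matrix `P` is an arbitrary choice) and uses the standard identity $h_k = H_{k+1} - H_k$, valid by stationarity. For such a chain $h_k = h$ as soon as $k \ge 1$, whereas $H_k/k - h = (h_0 - h)/k$ decays only like $1/k$.

```python
import math
from itertools import product

# Hypothetical two-state Markov chain of order 1 (illustrative assumption).
P = [[0.9, 0.1],
     [0.4, 0.6]]
# Stationary law: for two states, pi is proportional to (P[1][0], P[0][1]).
pi = [P[1][0] / (P[0][1] + P[1][0]), P[0][1] / (P[0][1] + P[1][0])]

def nu(word):
    """Stationary probability nu_k of the cylinder (a_1, ..., a_k)."""
    p = pi[word[0]]
    for a, b in zip(word, word[1:]):
        p *= P[a][b]
    return p

def extensive_entropy(k):
    """H_k = -sum over words of length k of nu_k(a) log nu_k(a)."""
    return -sum(nu(w) * math.log(nu(w)) for w in product((0, 1), repeat=k))

def conditional_entropy(k):
    """h_k = h(nu_{k+1}), computed as H_{k+1} - H_k, with h_0 = H_1."""
    if k == 0:
        return extensive_entropy(1)
    return extensive_entropy(k + 1) - extensive_entropy(k)

# For an order-1 Markov chain the entropy of the chain equals h_1.
h = conditional_entropy(1)
for k in (1, 2, 4, 8):
    # first gap decays like 1/k, second gap is zero up to rounding
    print(k, extensive_entropy(k) / k - h, conditional_entropy(k) - h)
```

For a genuine chain of infinite order the gap $h_k - h$ is not zero but, under exponential loss of memory, it is exponentially small in $k$, while $H_k/k - h$ is still of order $1/k$.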

In this paper we are concerned with the estimation of $h$, given a sample of the chain. Let
$x_1^n \in A^n$ be a finite sample of the chain $X = (X_i)_{i\in\mathbb{Z}}$ and take $k < n$. The empirical $k$-distribution,
given the sample, is the measure on $A^k$ defined as

$$\hat\nu_k(a_1^k; x_1^n) := \frac{1}{n-k+1} \sum_{i=1}^{n-k+1} 1(x_i^{i+k-1} = a_1^k),$$

for all $a_1^k \in A^k$, where $1(x_i^{i+k-1} = a_1^k)$ denotes the indicator function. We will also use the short
notation

$$\hat\nu_{k,n}(\cdot) = \hat\nu_k(\cdot\,; X_1^n).$$

The extensive empirical entropy of order $k$ is defined as

$$\hat H_{k,n} := H(\hat\nu_{k,n}),$$

and the conditional empirical entropy of order $k$ is defined as

$$\hat h_{k,n} := h(\hat\nu_{k+1,n}).$$
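The empirical quantities can be computed directly from a sample. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the extensive and conditional empirical entropies are obtained by evaluating $H$ and $h$ at the empirical distributions of windows of length $k$ and $k+1$ respectively, with the conditional probabilities taken from the empirical $(k+1)$-distribution itself.

```python
import math
from collections import Counter

def empirical_distribution(sample, k):
    """Relative frequencies of the n - k + 1 windows of length k in the sample."""
    n = len(sample)
    total = n - k + 1
    counts = Counter(tuple(sample[i:i + k]) for i in range(total))
    return {w: c / total for w, c in counts.items()}

def entropy(dist):
    """Shannon entropy (natural logarithm) of a finite distribution."""
    return -sum(p * math.log(p) for p in dist.values())

def extensive_empirical_entropy(sample, k):
    """Entropy of the empirical k-distribution."""
    return entropy(empirical_distribution(sample, k))

def conditional_empirical_entropy(sample, k):
    """Conditional entropy of the last symbol given the k previous ones,
    computed under the empirical (k+1)-distribution."""
    joint = empirical_distribution(sample, k + 1)
    if k == 0:
        return entropy(joint)
    past = Counter()
    for w, p in joint.items():
        past[w[:-1]] += p          # marginal of the first k coordinates
    return -sum(p * math.log(p / past[w[:-1]]) for w, p in joint.items())

sample = [0, 1, 0, 1, 0, 1, 0, 1]
print(extensive_empirical_entropy(sample, 1))
print(conditional_empirical_entropy(sample, 1))
```

On the alternating sample the empirical 1-distribution is uniform, while each symbol is determined by its predecessor, so the conditional empirical entropy of order 1 vanishes.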

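We will assume that the process $(X_n)_{n\in\mathbb{Z}}$ satisfies the following conditions.

(A) For all $a_1^n \in A^n$,

$$\nu([a_1^n]) > 0.$$

(B) The limit

$$\nu(a_0 \mid a_{-\infty}^{-1}) := \lim_{k\to+\infty} \nu_{k+1}(a_0 \mid a_{-k}^{-1})$$

exists for all $a_{-\infty}^0 \in A^{\mathbb{Z}_-}$.

(C) There is a sequence $(\gamma_m)_{m\in\mathbb{N}}$ with $\gamma_m \searrow 0$ when $m \to +\infty$ such that

$$\sup_{\{a,b \,:\, a_{-\infty}^0 \overset{m}{=} b_{-\infty}^0\}} \left| \frac{\nu(a_0 \mid a_{-\infty}^{-1})}{\nu(b_0 \mid b_{-\infty}^{-1})} - 1 \right| \le \gamma_m.$$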

We remark that for i.i.d. random variables $\gamma_0 = 0$ and for Markov chains of order $k \ge 1$
condition C is satisfied with $\gamma_m = 0$, for all $m \ge k$. We will call $(X_n)_{n\in\mathbb{Z}}$ a chain of infinite order
if $\gamma_m > 0$ for any $m$. Whenever $\gamma_m \le M e^{-cm}$ where $M$ and $c$ are suitable positive constants we
will say that the chain loses memory exponentially fast with rate (at least) $c$. This will always be
the case in this article.

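For any measurable function $f : A^{\mathbb{Z}} \to \mathbb{R}$ we will use the shorthand $E[f]$ to denote $E[f(X_{-\infty}^{+\infty})]$.
We will also use $E[T_d f]$ to denote $E[f(T_d X_{-\infty}^{+\infty})]$ where $T_d$ denotes the $d$-step shift $(T_d x)_i = x_{i+d}$,
for $d \in \mathbb{Z}$. We define

$$\sigma_f^2(m) := E\left[\left(\frac{1}{\sqrt{m}} \sum_{i=1}^{m} (T_i f - E[f])\right)^2\right].$$

A direct computation shows that

$$\sigma_f^2(m) = C_f(0) + 2\sum_{d=1}^{m-1} C_f(d) - \frac{2}{m}\sum_{d=1}^{m-1} d\, C_f(d), \qquad (2.2)$$

where

$$C_f(d) = E[f\, T_d f] - E[f]E[f].$$

If

$$\sum_{d=1}^{+\infty} d\, |C_f(d)| < +\infty, \qquad (2.3)$$

then

$$\sigma_f^2 := \lim_{m\to+\infty} \sigma_f^2(m) = C_f(0) + 2\sum_{d=1}^{+\infty} C_f(d). \qquad (2.4)$$

We recall that a chain of infinite order losing memory exponentially fast satisfies condition (2.3) (cf. [BFG]).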
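The variance identity for the normalized Birkhoff sum can be checked numerically. The following minimal sketch uses a hypothetical covariance function $C(d) = 2^{-d}$ (an assumption made only for illustration): it compares the double sum of covariances over all pairs of indices with the closed expression involving $C(0)$, $\sum C(d)$ and $\frac{2}{m}\sum d\,C(d)$, and then looks at the limit as $m$ grows.

```python
def variance_direct(C, m):
    """Variance of m^{-1/2} * sum_{i=1}^m T_i f, written as the double sum
    of covariances C(|i - j|) over all pairs (i, j)."""
    return sum(C(abs(i - j)) for i in range(m) for j in range(m)) / m

def variance_formula(C, m):
    """C(0) + 2 * sum_{d<m} C(d) - (2/m) * sum_{d<m} d * C(d)."""
    return (C(0)
            + 2 * sum(C(d) for d in range(1, m))
            - (2 / m) * sum(d * C(d) for d in range(1, m)))

# Hypothetical summable covariance, chosen only for illustration.
C = lambda d: 0.5 ** d

print(variance_direct(C, 50), variance_formula(C, 50))
# The correction term is of order 1/m, so the limit is C(0) + 2 * sum_{d>=1} C(d).
print(variance_formula(C, 10 ** 4))
```

With this choice of $C$ the series $\sum_{d\ge1} C(d)$ equals 1, so the limiting variance is 3, and the correction term $\frac{2}{m}\sum d\,C(d)$ is of order $1/m$.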

The main results of this paper are the following theorems. 
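
Theorem 2.1. Let $X = (X_i)_{i\in\mathbb{Z}}$ be a stationary process satisfying conditions A, B and C and
losing memory exponentially fast with rate $c > \log|A|$. Let us introduce the function $\phi(x) :=
-\log \nu(x_0 \mid x_{-\infty}^{-1})$. For any sequence $(k(n))_{n\in\mathbb{N}}$ of positive integers such that $\liminf_{n\to+\infty} k(n)/\log n > 1/(2c)$ and
$\limsup_{n\to+\infty} k(n)/\log n < 1/(2\log|A|)$ the following statements hold.

• If $\sigma_\phi^2 > 0$, then

$$\frac{\sqrt{n}}{\sigma_\phi}\left(\hat h_{k(n),n} - h\right) \xrightarrow{D} \mathcal{N}(0,1),$$

where $\xrightarrow{D}$ means convergence in distribution.

• If $\sigma_\phi^2 = 0$, then

$$\sqrt{n}\left(\hat h_{k(n),n} - h\right) \xrightarrow{P} 0,$$

where $\xrightarrow{P}$ means convergence in probability.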

Remark. In Theorem 2.1, if $X$ is a Markov chain of order $m$, then the lower bound for $k(n)$
becomes simply $\liminf k(n) > m$.

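Theorem 2.2. Let $X = (X_i)_{i\in\mathbb{Z}}$ be a stationary process satisfying conditions A, B and C and
losing memory exponentially fast with $\gamma_0 > 0$ and let $(q(n))_{n\in\mathbb{N}}$ be a sequence of positive real
numbers. For any sequence $(k(n))_{n\in\mathbb{N}}$ of positive integers such that $k(n) \to +\infty$ when $n \to +\infty$
and such that $\limsup_{n\to+\infty} k(n)/\log n < 1/(2\log|A|)$ the following statements hold.

• If $\lim_{n\to\infty} q(n)/k(n) = 0$, then

$$q(n)\left(\frac{1}{k(n)}\hat H_{k(n),n} - h\right) \xrightarrow{P} 0.$$

• If $\lim_{n\to\infty} q(n)/k(n) = \alpha$, $0 < \alpha < +\infty$, then

$$q(n)\left(\frac{1}{k(n)}\hat H_{k(n),n} - h\right) \xrightarrow{P} \alpha \sum_{i=0}^{+\infty} (h_i - h), \qquad (2.5)$$

where $h$ and $h_i$ are the entropies defined in (2.1).

• If $\lim_{n\to\infty} q(n)/k(n) = +\infty$, then for any $r \in \mathbb{R}$

$$\lim_{n\to+\infty} P\left\{ q(n)\left(\frac{1}{k(n)}\hat H_{k(n),n} - h\right) < r \right\} = 0.$$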

Remarks. Theorem 2.2 shows that, except for the i.i.d. case, the usual $\sqrt{n}$ scaling does not
produce asymptotic normality; in fact, no scaling does. For the case of i.i.d. processes ($\gamma_0 = 0$),
Basarin (1959) proved that the Central Limit Theorem holds for the extensive empirical entropy
when $q(n) = (n/\sigma_\phi^2)^{1/2}$.

The theorem remains true if $k(n) \to K < +\infty$ when $n \to +\infty$, with the only difference that the
right hand side of (2.5) is replaced by $\alpha \sum_{i=0}^{K-1} (h_i - h)$.

The proofs of Theorems 2.1 and 2.2 are based on the following Central Limit Theorem, which
is of independent interest. Before stating this theorem we need some extra notation.
Given a function $f : A^{\mathbb{Z}_-} \to \mathbb{R}$ we define its $m$-variation as

$$\mathrm{var}_m(f) := \sup_{\{a,b \,:\, a_{-\infty}^0 \overset{m}{=} b_{-\infty}^0\}} |f(a) - f(b)|.$$

Given a function $f : A^{\mathbb{Z}} \to \mathbb{R}$ we define its uniform norm as

$$\|f\|_\infty = \sup_{x \in A^{\mathbb{Z}}} |f(x)|.$$



Theorem 2.3. Let $X = (X_i)_{i\in\mathbb{Z}}$ be a stationary process satisfying conditions A, B and C and
losing memory exponentially fast, and such that $\sigma_\phi^2 > 0$, where $\phi$ is defined as in Theorem 2.1. Let
also $(f_k)_{k\in\mathbb{N}}$ be a family of uniformly bounded cylindric functions $f_k(x) = f_k(x_{-k}^0) : A^{k+1} \to \mathbb{R}$,
satisfying the conditions

$$\lim_{k\to+\infty} \|\phi - f_k\|_\infty = 0 \qquad (2.6)$$

and

$$\sup_{m} \mathrm{var}_k(f_m) \le M\gamma_k, \qquad (2.7)$$

for all $k \in \mathbb{N}$, where $M$ is a positive constant.

Then for any sequence $(k(n))_{n\in\mathbb{N}}$ of positive integers such that $k(n) \to +\infty$ when $n \to +\infty$ and
such that lim„^+oo ^^""^ = 0, then we have 



$$\frac{1}{\sigma_\phi\sqrt{n}} \sum_{i=1}^{n} \left\{ T_i f_{k(n)} - E[f_{k(n)}] \right\} \xrightarrow{D} \mathcal{N}(0,1).$$



We will use the letter $M$ to denote all the constants appearing in this paper, not only the
constant appearing in condition (2.7).

3 Proof of Theorem 2.3

We start the proof with a lemma on the variances associated with the functions $f_k$.

Lemma 3.1. Let $X = (X_i)_{i\in\mathbb{Z}}$ be a stationary process and $(f_k)_{k\in\mathbb{N}}$ a sequence of cylindric
functions and let us assume that both satisfy the hypotheses of Theorem 2.3. Then for any sequence
$(k(n))_{n\in\mathbb{N}}$ diverging to $+\infty$ when $n \to \infty$, we have

$$\lim_{n\to+\infty} \sigma^2_{f_{k(n)}}(n) = \sigma_\phi^2.$$

Proof. Using expressions (2.2) and (2.4) we observe that

$$\left|\sigma^2_{f_{k(n)}}(n) - \sigma_\phi^2\right| \le \left|C_{f_{k(n)}}(0) - C_\phi(0)\right| + 2\sum_{d=1}^{n-1} \left|C_{f_{k(n)}}(d) - C_\phi(d)\right| + 2\sum_{d=n}^{+\infty} |C_\phi(d)| + \frac{2}{n}\sum_{d=1}^{n-1} d\,|C_{f_{k(n)}}(d)|. \qquad (3.1)$$

By direct computation, we have for any $d \ge 0$

$$\left|C_{f_{k(n)}}(d) - C_\phi(d)\right| \le M \|f_{k(n)} - \phi\|_\infty. \qquad (3.2)$$

From this and the condition (2.6) it follows that

$$\lim_{n\to\infty} \left|C_{f_{k(n)}}(0) - C_\phi(0)\right| = 0.$$

Let now $(g(n))_{n\in\mathbb{N}}$ be a diverging sequence of positive integers such that

$$\lim_{n\to\infty} g(n)\,\|f_{k(n)} - \phi\|_\infty = 0. \qquad (3.3)$$

The second term in the right hand side of (3.1) can be decomposed as

$$2\sum_{d=1}^{n-1} \left|C_{f_{k(n)}}(d) - C_\phi(d)\right| = 2\sum_{d=1}^{g(n)} \left|C_{f_{k(n)}}(d) - C_\phi(d)\right| + 2\sum_{d=g(n)+1}^{n-1} \left|C_{f_{k(n)}}(d) - C_\phi(d)\right|. \qquad (3.4)$$

It follows directly from (3.2) and (3.3) that the first term in the right hand side of (3.4) converges to
0 as $n$ diverges.

To obtain an upper bound for the remaining terms in (3.1) and in (3.4) we use the well known fact
that under the conditions of Theorem 2.3 there exist two positive constants $M$ and $\eta$ such that

$$|C_{f_k}(d)| \le M e^{-\eta d} \quad \text{and} \quad |C_\phi(d)| \le M e^{-\eta d}.$$

This follows for instance from the main Theorem of [BFG]. This concludes the proof of the lemma.



The proof of the Central Limit Theorem uses the fact that under the hypotheses of Theorem
2.3 the chain can be constructed with a regenerative scheme. A regenerative scheme is a coupled
construction of the chain $X = (X_i)_{i\in\mathbb{Z}}$ together with a stationary renewal process $(\tau_i)_{i\in\mathbb{Z}}$ in such
a way that the random strings

$$\{(X_j \,;\, \tau_i \le j < \tau_{i+1})\}_{i\in\mathbb{Z}}$$

are independent and, except for $i = -1$, identically distributed. Here we adopt the convention
that

$$\tau_{-1} \le 0 < \tau_0.$$

The variables $\tau_i$ are called regeneration times of the chain. The renewal process can be constructed
in such a way that the distance between any two successive regeneration times has all its moments
finite. For a detailed and self-contained presentation of this type of construction the reader is
referred to [FFG], or to the original papers [Ber], [CFF] and [Lal].

Given a positive integer $s$, the decimated renewal process $(\tau_j^s)_{j\in\mathbb{Z}}$ is defined as

$$\tau_j^s := \tau_{sj}$$

for every $j \in \mathbb{Z}$. In what follows we will take $s = s(n)$ such that

$$\lim_{n\to\infty} \frac{k(n)^2}{s(n)} = 0 \quad \text{and} \quad \lim_{n\to\infty} \frac{s(n)^4}{n} = 0; \qquad (3.5)$$

this is possible by the hypothesis on the sequence $(k(n))_{n\in\mathbb{N}}$. From now on, whenever there is no
danger of confusion, we will write $s$ instead of $s(n)$ and $k$ instead of $k(n)$. We define

$$S_i(n) := \sum_{j=\tau_i^s + k}^{\tau_{i+1}^s - 1} \left\{ T_j f_k - E[f_k] \right\}.$$

This definition makes sense when $s(n) > k(n) + 1$, which is always the case for $n$ big enough. We
recall that with the regenerative scheme the random strings

$$\{(X_j \,;\, \tau_i^s \le j < \tau_{i+1}^s)\}_{i\in\mathbb{N}}$$

are independent and identically distributed. As a consequence the mean zero random variables
$(S_i(n))_{i\in\mathbb{N}}$ are independent and identically distributed. Using the Berry-Esseen inequality we will
show that they satisfy the Central Limit Theorem.
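
We rewrite

$$\sum_{i=1}^{n} \left\{ T_i f_k - E[f_k] \right\} = \sum_{i=0}^{N(n)} S_i(n) + R(n),$$

where

$$N(n) := \sum_{i=1}^{+\infty} 1(\tau_i^s \le n),$$

and $R(n)$ is the remainder.

The remainder can be written as

$$R(n) = \sum_{i=1}^{N(n)} R_i(n) + Q(n), \qquad (3.6)$$

where

$$R_i(n) := \sum_{j=\tau_i^s}^{\tau_i^s + k - 1} \left\{ T_j f_k - E[f_k] \right\},$$

and $Q(n)$ is the remainder of the remainder.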
Lemma 3.2. Under the hypotheses of Theorem 2.3 we have

$$\frac{R(n)}{\sqrt{n}} \xrightarrow{P} 0.$$

Proof. When $n$ is big enough the random variables $(R_i(n))_{i\in\mathbb{N}}$ are also i.i.d. Using Wald's
identity and Chebyshev's inequality we have

$$P\left\{ \frac{1}{\sqrt{n}} \left| \sum_{i=1}^{N(n)} R_i(n) \right| > \delta \right\} \le \frac{E[N(n)]\, E\left[(R_1(n))^2\right]}{\delta^2 n}. \qquad (3.7)$$

To conclude that the right hand side of inequality (3.7) converges to 0 we first observe that by the
renewal property

$$\lim_{n\to+\infty} E\left[\frac{N(n)\,s(n)}{n}\right] = \frac{1}{E[\tau_1 - \tau_0]} > 0. \qquad (3.8)$$

Therefore there exists a positive constant $M$ such that

$$E[N(n)] \le M\,\frac{n}{s(n)} \qquad (3.9)$$

for all $n$ big enough. Since the functions $f_k$ are uniformly bounded there exists a positive
constant $M$ such that

$$E\left[(R_1(n))^2\right] \le M\,k(n)^2. \qquad (3.10)$$

Putting together inequalities (3.9) and (3.10) we conclude that the right hand side of inequality (3.7) is
bounded above by $M\,k(n)^2/s(n)$, where $M$ is a suitable positive constant. One of the conditions satisfied
by the sequences $k(n)$ and $s(n)$ implies that this upper bound converges to 0 as $n$ diverges.
The proof that

$$\frac{Q(n)}{\sqrt{n}} \xrightarrow{P} 0$$

is analogous. This concludes the proof of the lemma.

To conclude the proof of the Theorem we must show that

$$\frac{1}{\sigma_\phi\sqrt{n}} \sum_{i=0}^{N(n)} S_i(n) \xrightarrow{D} \mathcal{N}(0,1). \qquad (3.11)$$

The law of large numbers implies that

$$\frac{N(n)}{[E[N(n)]]} \xrightarrow{P} 1,$$

where $[E[N(n)]]$ denotes the integer part of the expectation $E[N(n)]$. Therefore, by a standard
argument (see, for instance, Theorem 17.1 from Billingsley's classical treatise [Bill]), (3.11) will follow
from

$$\frac{1}{\sigma_\phi\sqrt{n}} \sum_{i=0}^{[E[N(n)]]} S_i(n) \xrightarrow{D} \mathcal{N}(0,1). \qquad (3.12)$$

To prove (3.12), before using the Berry-Esseen inequality we need to show that the scaling factor
appearing in the statement of the Theorem has the right asymptotic behavior. This is the content
of the next lemma.

Lemma 3.3. With the conditions of Theorem 2.3 the following limit holds

$$\lim_{n\to+\infty} \frac{1}{n}\,\mathrm{Var}\left[\sum_{i=0}^{[E[N(n)]]} S_i(n)\right] = \sigma_\phi^2, \qquad (3.13)$$

where $\mathrm{Var}[\cdot]$ denotes the variance.

Proof. Using the fact that $S_i(n)$ are mean zero independent random variables together with (3.8)
it is easy to see that expression (3.13) is equivalent to

$$\lim_{n\to+\infty} \frac{E\left[(S_1(n))^2\right]}{s(n)\,E[\tau_1 - \tau_0]} = \sigma_\phi^2. \qquad (3.14)$$

To prove (3.14) we first recall that Lemma 3.1 assures that

$$\sigma_\phi^2 = \lim_{n\to+\infty} \sigma^2_{f_{k(n)}}(n). \qquad (3.15)$$

Using (3.6) we rewrite the expression inside the limit in the right-hand side of (3.15) as

$$\sigma^2_{f_{k(n)}}(n) = l_n\,\frac{E\left[(S_1(n))^2\right]}{s(n)\,E[\tau_1 - \tau_0]}\,\bigl(1 + r(n)\bigr), \qquad (3.16)$$

where

$$r(n) = \frac{E\left[R(n)^2\right] + 2E\left[R(n)\sum_{i=0}^{N(n)} S_i(n)\right]}{E[N(n)+1]\,E\left[(S_1(n))^2\right]} \qquad (3.17)$$

and $l_n$ is a sequence converging to 1 when $n$ diverges. We observe that

$$E\left[R(n)^2\right] \le M\,E[N(n)]\,k(n)^2.$$

A proof of this fact, ignoring the remainder $Q(n)$, is given in Lemma 3.2. The remainder can be
treated in a similar way.

Using the Cauchy-Schwarz inequality and Wald's identity we obtain the upper bound

$$|r(n)| \le M \sqrt{\frac{E\left[R(n)^2\right]}{E[N(n)+1]\,E\left[(S_1(n))^2\right]}} \le \frac{M\,k(n)}{\sqrt{E\left[(S_1(n))^2\right]}}, \qquad (3.18)$$

where $M$ is a suitable positive constant.
If

$$\lim_{n\to+\infty} \frac{k(n)}{\sqrt{E\left[(S_1(n))^2\right]}} = 0,$$

then the proof would be concluded. Let us assume that this is not the case. Then for some $\epsilon > 0$
there exist infinitely many values of $n$ such that

$$k(n) > \epsilon\,\sqrt{E\left[(S_1(n))^2\right]}.$$

However, using this inequality inside (3.16) would lead to a result which is in contradiction with
Lemma 3.1. This concludes the proof of the lemma. ■



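To conclude the proof of Theorem 2.3 we will show that

$$\lim_{n\to+\infty}\,\sup_{r\in\mathbb{R}} \left| P\left\{ \frac{1}{\sigma_\phi\sqrt{n}} \sum_{i=0}^{[E[N(n)]]} S_i(n) \le r \right\} - \Phi(r) \right| = 0, \qquad (3.19)$$

where

$$\Phi(r) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{r} e^{-u^2/2}\,du.$$

By Lemma 3.3

$$\sqrt{E\left[(S_1(n))^2\right]\,[E[N(n)+1]]} = l_n\,\sigma_\phi\sqrt{n},$$

where $l_n \to 1$ when $n \to +\infty$. Therefore the left hand side of expression (3.19) can be rewritten as

$$\sup_{r\in\mathbb{R}} \left| P\left\{ \frac{\sum_{i=0}^{[E[N(n)]]} S_i(n)}{\sqrt{E\left[(S_1(n))^2\right]\,[E[N(n)+1]]}} \le r/l_n \right\} - \Phi(r) \right|. \qquad (3.20)$$

We modify the expression inside the absolute value of (3.20) by adding and subtracting $\Phi(r/l_n)$ and
then using the triangle inequality. The second term of the upper bound obtained this way is just

$$\sup_{r\in\mathbb{R}} \left|\Phi(r) - \Phi(r/l_n)\right|,$$

which goes to 0 as $n$ diverges. Now we finally use the Berry-Esseen Theorem in the remaining
term to obtain the upper bound

$$\sup_{r\in\mathbb{R}} \left| P\left\{ \frac{\sum_{i=0}^{[E[N(n)]]} S_i(n)}{\sqrt{E\left[(S_1(n))^2\right]\,[E[N(n)+1]]}} \le r/l_n \right\} - \Phi(r/l_n) \right| \le \frac{M\,E\left[|S_1(n)|^3\right]}{\left(E\left[(S_1(n))^2\right]\right)^{3/2} [E[N(n)+1]]^{1/2}}. \qquad (3.21)$$

We refer the reader to the classical treatise of Shiryaev [Sh] for a presentation of the Berry-Esseen
Theorem.

The fact that the functions $f_{k(n)}$ are uniformly bounded provides an upper bound for the
numerator of (3.21)

$$E\left[|S_1(n)|^3\right] \le M\,s(n)^3, \qquad (3.22)$$

where $M$ is a positive constant. From (3.8) we obtain the lower bound

$$[E[N(n)+1]] \ge M\,\frac{n}{s(n)}, \qquad (3.23)$$

where $M$ is another positive constant. Finally, as a byproduct of Lemma 3.3 we obtain the lower
bound

$$E\left[(S_1(n))^2\right] \ge M\,s(n), \qquad (3.24)$$

where $M$ is a third positive constant. Putting together the bounds (3.21), (3.22), (3.23) and (3.24) we
conclude that

$$\sup_{r\in\mathbb{R}} \left| P\left\{ \frac{\sum_{i=0}^{[E[N(n)]]} S_i(n)}{\sqrt{E\left[(S_1(n))^2\right]\,[E[N(n)+1]]}} \le r/l_n \right\} - \Phi(r/l_n) \right| \le M\,\frac{s(n)^2}{\sqrt{n}}.$$

By hypothesis this last upper bound goes to 0 as $n$ diverges, and this concludes the proof of the
Theorem.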

4 Proof of Theorem 2.1

We first consider the case $\sigma_\phi^2 > 0$. Let us introduce the cylinder functions $\phi_k(x) = \phi_k(x_{-k}^0)$ defined
by $\phi_k(x) := -\log \nu_{k+1}(x_0 \mid x_{-k}^{-1})$ and define

$$D(n) = \sqrt{n}\left(\hat h_{k(n),n} - h\right) - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left( T_i \phi_{k(n)}(x) - E[\phi_{k(n)}] \right).$$

It is easy to see that the sequence of functions $(\phi_k)_{k\ge1}$ satisfies the conditions of Theorem 2.3.
Therefore if $\sigma_\phi^2 > 0$, it follows that

$$\frac{1}{\sigma_\phi\sqrt{n}} \sum_{i=1}^{n} \left\{ T_i \phi_{k(n)} - E[\phi_{k(n)}] \right\} \xrightarrow{D} \mathcal{N}(0,1).$$

Therefore to prove Theorem 2.1 it is enough to show that

$$D(n) \xrightarrow{P} 0, \qquad (4.1)$$

as $n$ diverges. To prove (4.1) we will decompose $D(n)$ in three parts and show that each one of them
converges to 0 in probability. The parts are defined as follows

$$D_1(n) = \sqrt{n}\left(h_{k(n)} - h\right),$$

$$D_2(n) = \sqrt{n}\left[ H(\hat\nu_{k(n),n} \mid \nu_{k(n)}) - H(\hat\nu_{k(n)+1,n} \mid \nu_{k(n)+1}) \right],$$

where

$$H(\hat\nu_{k,n} \mid \nu_k) = \sum_{a_1^k \in A^k} \hat\nu_{k,n}(a_1^k) \log \frac{\hat\nu_{k,n}(a_1^k)}{\nu_k(a_1^k)},$$

and

$$D_3(n) = D(n) - D_1(n) - D_2(n) = \frac{\sqrt{n}}{n - k(n)} \sum_{i=k(n)+1}^{n} T_i \phi_{k(n)}(x) - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} T_i \phi_{k(n)}(x).$$

The expert reader has already identified $H(\hat\nu_{k,n} \mid \nu_k)$ as the relative entropy between the probability
measures $\hat\nu_{k,n}$ and $\nu_k$ on $A^k$.

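Lemma 4.1. Let $X$ be a stationary chain satisfying conditions A, B and C and losing memory
exponentially fast with rate at least $c$, where $c$ is any fixed positive real number. For any sequence
$(k(n))_{n\in\mathbb{N}}$ of positive integers such that

$$\liminf_{n\to+\infty} \frac{k(n)}{\log n} > \frac{1}{2c} \qquad (4.2)$$

we have

$$\lim_{n\to+\infty} D_1(n) = 0.$$

Proof. By Jensen's inequality

$$h_k - h \ge 0. \qquad (4.3)$$

By definition

$$h_k - h = \int d\nu(a_{-\infty}^0)\, \log \frac{\nu(a_0 \mid a_{-\infty}^{-1})}{\nu_{k+1}(a_0 \mid a_{-k}^{-1})}. \qquad (4.4)$$

By standard calculus

$$\int d\nu(a_{-\infty}^0)\, \log \frac{\nu(a_0 \mid a_{-\infty}^{-1})}{\nu_{k+1}(a_0 \mid a_{-k}^{-1})} \le \int d\nu(a_{-\infty}^0) \left| \frac{\nu(a_0 \mid a_{-\infty}^{-1})}{\nu_{k+1}(a_0 \mid a_{-k}^{-1})} - 1 \right|. \qquad (4.5)$$

It follows from hypothesis C that

$$\int d\nu(a_{-\infty}^0) \left| \frac{\nu(a_0 \mid a_{-\infty}^{-1})}{\nu_{k+1}(a_0 \mid a_{-k}^{-1})} - 1 \right| \le \gamma_k. \qquad (4.6)$$

By hypothesis

$$\gamma_k \le M \exp\{-ck\}, \qquad (4.7)$$

where $M$ is a strictly positive constant. Using together expressions (4.3), (4.4), (4.5), (4.6) and (4.7) we
obtain

$$0 \le D_1(n) \le M\sqrt{n}\,\exp\{-c\,k(n)\}.$$

The conclusion of the lemma now follows directly from (4.2). ■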
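The role of the lower growth condition on $k(n)$ can be made concrete. The sketch below is an illustration under assumptions that are not in the paper: it takes $k(n) = \kappa \log n$ exactly (ignoring the integer constraint) and $M = 1$, so that the upper bound $M\sqrt{n}\,e^{-c\,k(n)}$ becomes $n^{1/2 - c\kappa}$, which vanishes precisely when $\kappa > 1/(2c)$.

```python
import math

def d1_bound(n, c, kappa, M=1.0):
    """M * sqrt(n) * exp(-c * k(n)) for the hypothetical choice k(n) = kappa * log(n)."""
    return M * math.sqrt(n) * math.exp(-c * kappa * math.log(n))

c = 1.0  # hypothetical memory-loss rate; the threshold for kappa is 1 / (2c) = 0.5
for kappa in (0.4, 0.5, 0.6):
    # below the threshold the bound grows, at the threshold it is constant,
    # above the threshold it decays
    print(kappa, [d1_bound(10 ** j, c, kappa) for j in (2, 4, 6)])
```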

Before proving that $D_2(n) \xrightarrow{P} 0$ we need the following lemma.

Lemma 4.2. If $X$ is a stationary chain satisfying conditions A, B, C and losing memory
exponentially fast then

$$\sup_{n,k}\ \sup_{a_1^k \in A^k} \frac{\sigma^2_{1[a_1^k]}(n)}{\nu_k(a_1^k)} < +\infty,$$

where $1[a_1^k] = 1(x_1^k = a_1^k)$ denotes the indicator function of the cylinder set $[a_1^k]$.

Proof. From (2.2) we obtain:

$$\sigma^2_{1[a_1^k]}(n) = C_{1[a_1^k]}(0) + 2\sum_{d=1}^{n-1} \left(1 - \frac{d}{n}\right) C_{1[a_1^k]}(d). \qquad (4.8)$$

We need an upper bound for the right hand side of (4.8). The $\phi$-mixing property of the chain implies
that

$$\left|C_{1[a_1^k]}(d)\right| \le M\,\nu([a_1^k])\,e^{-\eta d}, \qquad (4.9)$$

where $\eta$ and $M$ are suitable positive constants. A constructive proof of this can be performed
using the regenerative construction which was introduced in the proof of Theorem 2.3 (cf. [FFG]).
To conclude the proof it is enough to use (4.9) in the right hand side of (4.8). ■

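Lemma 4.3. Let $X$ be a stationary chain satisfying conditions A, B and C and losing memory
exponentially fast. For any sequence $(k(n))_{n\in\mathbb{N}}$ of positive integers such that

$$\limsup_{n\to+\infty} \frac{k(n)}{\log n} < \frac{1}{2\log|A|} \qquad (4.10)$$

we have

$$D_2(n) \xrightarrow{P} 0.$$

Proof. By definition

$$H(\hat\nu_{k,n} \mid \nu_k) = \sum_{a_1^k \in A^k} \hat\nu_{k,n}(a_1^k) \log \frac{\hat\nu_{k,n}(a_1^k)}{\nu_k(a_1^k)}.$$

Therefore by Jensen's inequality $0 \le H(\hat\nu_{k,n} \mid \nu_k)$. This allows us to use Markov's inequality to get

$$P\left\{ \sqrt{n}\,H(\hat\nu_{k(n),n} \mid \nu_{k(n)}) > \delta \right\} \le \frac{\sqrt{n}}{\delta}\,E\left[H(\hat\nu_{k(n),n} \mid \nu_{k(n)})\right], \qquad (4.11)$$

for any $\delta > 0$.

We need an upper bound for the expectation on the right hand side of inequality (4.11). We use
again Jensen's inequality to obtain

$$H(\hat\nu_{k,n} \mid \nu_k) \le \log\left\{ \sum_{a_1^k \in A^k} \frac{(\hat\nu_{k,n}(a_1^k))^2}{\nu_k(a_1^k)} \right\}. \qquad (4.12)$$

We observe that

$$\sum_{a_1^k \in A^k} \frac{(\hat\nu_{k,n}(a_1^k))^2}{\nu_k(a_1^k)} = 1 + \sum_{a_1^k \in A^k} \frac{(\hat\nu_{k,n}(a_1^k) - \nu_k(a_1^k))^2}{\nu_k(a_1^k)}.$$

Therefore from (4.12) and standard calculus it follows that

$$H(\hat\nu_{k,n} \mid \nu_k) \le \sum_{a_1^k \in A^k} \frac{(\hat\nu_{k,n}(a_1^k) - \nu_k(a_1^k))^2}{\nu_k(a_1^k)}. \qquad (4.13)$$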

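Using the definition of $\hat\nu_{k,n}$ we can rewrite the right hand side of (4.13) as

$$\sum_{a_1^k \in A^k} \frac{1}{\nu_k(a_1^k)} \left\{ \frac{1}{n-k+1} \sum_{i=1}^{n-k+1} \left( 1(x_i^{i+k-1} = a_1^k) - \nu_k(a_1^k) \right) \right\}^2. \qquad (4.14)$$

Therefore from (4.13) and (4.14) it follows that

$$E\left[H(\hat\nu_{k(n),n} \mid \nu_{k(n)})\right] \le \frac{1}{n - k(n) + 1} \sum_{a_1^{k(n)} \in A^{k(n)}} \frac{\sigma^2_{1[a_1^{k(n)}]}(n - k(n) + 1)}{\nu_{k(n)}(a_1^{k(n)})}. \qquad (4.15)$$

By Lemma 4.2 the right hand side of (4.15) is bounded above by

$$\frac{M\,|A|^{k(n)}}{n - k(n) + 1}, \qquad (4.16)$$

where $M$ is a positive constant. Putting together inequalities (4.11), (4.15) and (4.16) we conclude that

$$P\left\{ \sqrt{n}\,H(\hat\nu_{k(n),n} \mid \nu_{k(n)}) > \delta \right\} \le \frac{M\sqrt{n}\,|A|^{k(n)}}{\delta\,(n - k(n) + 1)}.$$

By hypothesis (4.10) this last upper bound converges to 0 as $n$ diverges. This concludes the proof
of the lemma. ■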
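The chain of inequalities used in this proof, relative entropy bounded by the logarithm of $1 + \chi^2$ and then by the chi-square distance itself, can be checked on a small example. The sketch below is illustrative: the two distributions are arbitrary hypothetical choices and the function names are not from the paper.

```python
import math

def relative_entropy(p, q):
    """H(p | q) = sum_a p(a) log(p(a) / q(a)), with the convention 0 log 0 = 0."""
    return sum(pa * math.log(pa / q[a]) for a, pa in p.items() if pa > 0)

def chi_square(p, q):
    """sum_a (p(a) - q(a))^2 / q(a); q is assumed strictly positive."""
    return sum((p.get(a, 0.0) - qa) ** 2 / qa for a, qa in q.items())

# Hypothetical distributions on a three-letter alphabet.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.4, "b": 0.4, "c": 0.2}

kl = relative_entropy(p, q)
chi2 = chi_square(p, q)
# Jensen gives H(p | q) <= log(1 + chi2), and log(1 + x) <= x gives the final bound.
print(kl, math.log(1 + chi2), chi2)
```

The advantage of the chi-square bound in the proof is that its expectation is a sum of variances of empirical frequencies, which is what the preceding lemma on indicator functions controls.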

Lemma 4.4. From the hypotheses of Lemma 4.3 it follows that

$$D_3(n) \to 0.$$

Proof. The result follows immediately from the fact that the functions $\phi_k$, $k \in \mathbb{N}$, are uniformly
bounded. ■

This concludes the proof of Theorem 2.1 in the case $\sigma_\phi^2 > 0$.

Let us finally consider the case $\sigma_\phi^2 = 0$. With our assumptions, the variance vanishes if and
only if the chain is an i.i.d. sequence of random variables with

$$P\{X_n = a\} = \frac{1}{|A|},$$

for any $a \in A$. In this case

$$\phi_k(x) = \phi(x) = \log|A|,$$

for any $x$ and all $k$. This implies that

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left\{ T_i \phi_{k(n)} - E[\phi_{k(n)}] \right\} = 0$$

and the conclusion of the second part of Theorem 2.1 follows trivially.

5 Proof of Theorem 2.2

We follow the same approach as for the proof of Theorem 2.1. We define

$$W(n) = q(n)\left(\frac{1}{k(n)}\hat H_{k(n),n} - h\right) - \frac{q(n)}{n} \sum_{i=1}^{n} \left\{ T_i \bar\phi_{k(n)} - E[\bar\phi_{k(n)}] \right\}, \qquad (5.1)$$

where the functions $\bar\phi_k$ are cylinder functions defined by

$$\bar\phi_k := \frac{1}{k+1} \sum_{i=0}^{k} \phi_i$$

and $\phi_k(x) = \phi_k(x_{-k}^0)$ are the cylinder functions introduced in the proof of Theorem 2.1 and defined
by $\phi_k(x) := -\log \nu_{k+1}(x_0 \mid x_{-k}^{-1})$, when $k \ge 1$, and $\phi_0(x) := -\log \nu_1(x_0)$.
We will decompose $W(n)$ in three parts defined as follows

$$W_1(n) = q(n)\left(\frac{H_{k(n)}}{k(n)} - h\right),$$

$$W_2(n) = -\frac{q(n)}{k(n)}\,H(\hat\nu_{k(n),n} \mid \nu_{k(n)}),$$

and

$$W_3(n) = W(n) - W_1(n) - W_2(n).$$

Let us briefly sketch the proof. The functions $\bar\phi_k$ satisfy the conditions of Theorem 2.3 and
by assumption we have $\sigma_\phi^2 > 0$. Therefore if we take $q(n) = \sqrt{n}$, then the second term of the right
hand side of (5.1) converges to a normal distribution with variance $\sigma_\phi^2$. At this point we could try to
reproduce the proof of Theorem 2.1 and to show that the remainder $W(n)$ vanishes as $n$ diverges.
This would imply that the first term of the right hand side of (5.1) is asymptotically normal.

The second and the third terms of the remainder both converge to 0 in probability, when
$q(n) = \sqrt{n}$. However, with this choice of $q(n)$, the term $W_1(n)$ diverges to $+\infty$.

That is the end of the story. There is no scaling factor which simultaneously assures asymptotic
normality for the second term in the right-hand side of (5.1) and convergence to 0 for the remainder
term. This impossibility is due to the slow rate of convergence of the extensive entropy, much
slower than the rate for the conditional entropy. This is the content of the next lemmas. In all of them
we will assume that $X$ is a chain satisfying the hypotheses of Theorem 2.2.

Lemma 5.1. For any sequence $(k(n))_{n\in\mathbb{N}}$ of positive integers such that $k(n) \to +\infty$ when $n \to +\infty$
the following statements hold.

• If $\lim_{n\to\infty} q(n)/k(n) = 0$, then

$$\lim_{n\to+\infty} W_1(n) = 0.$$

• If $\lim_{n\to\infty} q(n)/k(n) = \alpha$, $0 < \alpha < +\infty$, then

$$\lim_{n\to+\infty} W_1(n) = \alpha \sum_{i=0}^{+\infty} (h_i - h),$$

with $0 < \sum_{i=0}^{+\infty} (h_i - h) < +\infty$.

• If $\lim_{n\to\infty} q(n)/k(n) = +\infty$, then

$$\lim_{n\to+\infty} W_1(n) = +\infty.$$

Proof. By definition, we have that

$$W_1(n) = \frac{q(n)}{k(n)} \sum_{i=0}^{k(n)-1} (h_i - h).$$

Moreover, Jensen's inequality implies that $\sum_{i=0}^{+\infty} (h_i - h) \ge 0$. This is actually a strict inequality,
since, by assumption, $\gamma_0 > 0$ and therefore the chain $X$ is not a sequence of independent random
variables. The fact that $\sum_{i=0}^{+\infty} (h_i - h) < +\infty$ follows easily from expressions (4.4), (4.5) and (4.6). The
lemma follows immediately from these facts. ■
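The three regimes for the deterministic term $W_1(n) = \frac{q(n)}{k(n)}\sum_{i<k(n)}(h_i - h)$ can be illustrated on a toy case. The following sketch assumes a hypothetical two-state Markov chain of order 1 (not a chain of infinite order, chosen only because its entropies are explicit): there $h_i = h$ for $i \ge 1$, so the sum reduces to $h_0 - h$ and $W_1(n) = (q(n)/k(n))(h_0 - h)$.

```python
import math

# Hypothetical two-state Markov chain of order 1 and its stationary law.
P = [[0.9, 0.1], [0.4, 0.6]]
pi = [0.8, 0.2]  # satisfies pi = pi P for the matrix above

def shannon(p):
    """Shannon entropy (natural logarithm) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

h0 = shannon(pi)                                   # h_0: entropy of one symbol
h = sum(pi[a] * shannon(P[a]) for a in range(2))   # entropy of the chain

def w1(q, k):
    """W_1 for an order-1 Markov chain: only the i = 0 term of the sum survives."""
    return q / k * (h0 - h)

for n in (10 ** 2, 10 ** 4, 10 ** 6):
    k = math.log(n)                # k(n) of logarithmic order, ignoring rounding
    # q(n) = sqrt(n): W_1 diverges; q(n) = 2 k(n): W_1 equals 2 (h_0 - h)
    print(n, w1(math.sqrt(n), k), w1(2.0 * k, k))
```

With $q(n) = \sqrt{n}$ and $k(n)$ of order $\log n$, the term grows without bound, which is the obstruction to Gaussian fluctuations for the extensive empirical entropy.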

Lemma 5.2. For any sequence $k(n)$ satisfying the hypotheses of Theorem 2.2 and for any sequence
$q(n)$ such that $q(n) \le M\,k(n)\sqrt{n}$, with $M$ a positive constant, we have that

$$W_2(n) \xrightarrow{P} 0$$

as $n$ diverges.

Proof. The proof is identical to the proof of Lemma 4.3. ■
Lemma 5.3. There exists a positive constant M such that 

n 

Proof. The result follows immediately from the fact that the functions $\bar\phi_k$, $k \in \mathbb{N}$, are uniformly
bounded. ■

Lemmas 5.1, 5.2 and 5.3 together with Theorem 2.3 imply that Theorem 2.2 holds for sequences
$q(n)$ bounded above by $M\sqrt{n}$, where $M$ is a positive constant.

Lemma 5.4. If $\limsup_{n\to+\infty} q(n)/\sqrt{n} = +\infty$ and $\lim_{n\to+\infty} q(n)/k(n) = +\infty$, then for any $r > 0$
we have

$$\lim_{n\to+\infty} P\left\{ q(n)\left(\frac{\hat H_{k(n),n}}{k(n)} - h\right) > r \right\} = 1.$$

Proof. By hypothesis, for any positive $M$, we have $q(n) > M\sqrt{n}$ for infinitely many values of $n$.
Let us define

$$\tilde q(n) = \min\{q(n), M\sqrt{n}\}.$$

Obviously we have

$$P\left\{ q(n)\left(\frac{\hat H_{k(n),n}}{k(n)} - h\right) > r \right\} \ge P\left\{ \tilde q(n)\left(\frac{\hat H_{k(n),n}}{k(n)} - h\right) > r \right\}. \qquad (5.2)$$

The part of Theorem 2.2 already proved implies that the right-hand side of inequality (5.2) converges
to 1. This concludes the proof of Lemma 5.4 as well as the proof of Theorem 2.2. ■